A Mean Deviation Based Splitting Criterion for Classification Tree
Abstract
Many splitting criteria have been used for learning classification trees; the most common impurity-based criteria are the Gini index, the entropy function and the exponent-based index. When misclassification rates are compared, no single splitting criterion can be declared the best in every situation. In this study, a new mean deviation (M.D.) based index is proposed, and a simulation study is designed to explore which measure gives the best results in which type of situation. The simulation study shows that, in general, the proposed M.D.-based index and the exponent-based index give excellent results on imbalanced data, whereas the Gini index and the entropy function have lower misclassification rates on balanced data.
Keywords: "Classification Tree", "Misclassification rate", "Impurity-based criteria", "Simulation", "Imbalanced data"
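As a rough illustration of two of the impurity-based criteria named in the abstract, the sketch below computes the Gini index and the entropy function for a node, and the impurity reduction of a candidate binary split. The function names and the toy label lists are assumptions made for this example. The M.D.-based and exponent-based indices are left out because the abstract does not give their formulas.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return the proportion of each class present in a node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Entropy function in bits: sum of -p * log2(p) over the classes."""
    return sum(-p * math.log2(p) for p in class_proportions(labels))


def split_gain(parent, left, right, impurity):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Hypothetical toy nodes: one balanced, one imbalanced.
balanced = ["a"] * 50 + ["b"] * 50
imbalanced = ["a"] * 95 + ["b"] * 5

if __name__ == "__main__":
    for name, node in [("balanced", balanced), ("imbalanced", imbalanced)]:
        print(name, round(gini(node), 4), round(entropy(node), 4))
    # A split that separates the two classes perfectly removes all impurity.
    print(split_gain(balanced, ["a"] * 50, ["b"] * 50, gini))
```

Both measures are largest for the balanced node and shrink as one class dominates, which is one reason the choice of criterion can matter more on imbalanced data.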
Related articles
Assessing Behavioral Patterns of Motorcyclists Based on Traffic Control Device at City Intersections by Classification Tree Algorithm
According to forensic statistics, motorcyclists have accounted for 26 percent of those killed in traffic accidents in Iran in recent years. Given this high number of motorcyclist casualties, it is necessary to investigate the causes of motorcycle accidents. Motorcyclists' dangerous behaviors are among the causes of events that are discussed in this study. Traffic signs have the important role of...
Families of splitting criteria for classification trees
Several splitting criteria for binary classification trees are shown to be written as weighted sums of two values of divergence measures. This weighted sum approach is then used to form two families of splitting criteria. One of them contains the chi-squared and entropy criterion, the other contains the mean posterior improvement criterion. Both family members are shown to have the property of ...
The Comparison of Gini and Twoing Algorithms in Terms of Predictive Ability and Misclassification Cost in Data Mining: An Empirical Study
The classification tree is commonly used in data mining, particularly for investigating interactions among predictors. The splitting rule and the decision tree technique employ algorithms that are largely based on statistical and probability methods. The splitting procedure is the most important phase of classification tree training. The aim of this study is to compare Gini and Twoing splitting rul...
Comparison of Machine Learning Algorithms for Broad Leaf Species Classification Using UAV-RGB Images
Knowing the tree species combination of forests provides valuable information for studying the forest’s economic value, fire risk assessment, biodiversity monitoring, and wildlife habitat improvement. Fieldwork is often time-consuming and labor-intensive, free satellite data are available in coarse resolution, and the use of manned aircraft is relatively costly. Recently, unmanned aeria...
Comparing different stopping criteria for fuzzy decision tree induction through IDFID3
Fuzzy Decision Tree (FDT) classifiers combine decision trees with approximate reasoning offered by fuzzy representation to deal with language and measurement uncertainties. When an FDT induction algorithm utilizes stopping criteria for early stopping of the tree's growth, threshold values of stopping criteria will control the number of nodes. Finding a proper threshold value for a stopping crite...